Pattern recognition system with statistical classification

ABSTRACT

A pattern recognition system is described. During training, multiple training input patterns from multiple classes of subjects are grouped into clusters within categories by computing correlations between the training patterns and present category definitions. After training, each category is labeled in accordance with the peak class of patterns received within the cluster of the category. If the domination of the peak class over the other classes in the category exceeds a preset threshold, then the peak class defines the category. If the contrast does not exceed the threshold, then the category is defined unknown. The class statistics for each category are stored in the form of a training class histogram for the category. During testing, frames of test data are received from a subject and are correlated with the category definitions. Each frame is associated with the training class histogram for the closest correlated category. For multiple-frame processing, the histograms are combined into a single observation class histogram which identifies the subject with its peak class within a predefined degree of confidence. In a multiple-channel configuration, the training patterns and testing patterns are divided into multiple features.

GOVERNMENT FUNDING

This invention was made with government support under Contract Number F19628-90-C-002 awarded by the Air Force. The Government has certain rights in the invention.

RELATED APPLICATIONS

This application is the U.S. National Phase of International Application No. PCT/US94/10527, filed on Sep. 16, 1994, which is a Continuation-in-Part application of U.S. Ser. No. 08/122,705, filed on Sep. 16, 1993, now U.S. Pat. No. 5,537,488, the teachings of which are herein incorporated in their entirety by reference.

BACKGROUND OF THE INVENTION

Pattern recognition systems automatically identify patterns of input data based on patterns previously received. A system is first trained by inputting multiple training patterns and forming categories. After training, the system receives a pattern for identification. It compares the pattern with the categories and, based on the comparisons, identifies the pattern as belonging to one of the categories.

SUMMARY OF THE INVENTION

The present invention is directed to a pattern recognition system and a method for recognizing input data patterns from a subject and classifying the subject. The system first performs a training operation in which the system generates a set or library of categories. During the training operation, input training patterns are received and grouped into clusters. Each cluster of training patterns is associated with a category having a category definition based on the training patterns in the cluster. As each training pattern is received, a correlation or distance is computed between it and each of the existing categories. Based on the correlations, a best match category is selected. The best match correlation is compared to a preset training correlation threshold. If the correlation is above the threshold, then the training pattern is added to the cluster of the best match category, and the definition of the category is updated in accordance with a learning rule to include the contribution from the new training pattern. If the correlation is below the threshold, a new category defined by the training pattern is formed, the cluster of the new category having only the single training pattern.

Training patterns are usually received from multiple classes of subjects. A class is a particular item or person which the network is trained to identify. A category is defined by particular features or views of the subjects. For example, if the system is used to visually classify automobiles by model, each model of automobile would be a separate class. Specific recognizable features of the automobiles, such as fenders of a particular shape or particular tires, could be categories. Since different models (classes) can have similar appearing fenders or tires (categories), the clusters of each category will generally include training patterns from more than one class. That is, since the fenders of different models may appear similar, the cluster of a fender category will include training patterns from more than one model. When training such a system, multiple photographs of each model (class) taken from different views and/or showing different features could be input to the system to form the categories.

In the case of face recognition, each class would be a separate person. The categories could be defined according to particular view orientations or facial features. To train a face recognition system, photographs showing several views of each person of a group of persons could be input to the system. The views could include front, left side and right side views as well as views of individual persons with and without glasses and/or with and without facial hair. Since more than one person can appear similar from a particular view or with a particular facial feature, each view or feature category can include several persons (classes).

Just as several classes can have similar appearing features and will as a result be grouped in feature clusters, features of different classes can also appear very different. As a result, different categories of corresponding features will be formed for different classes. For example, fenders or tires from different models of automobile may appear very different. So, multiple tire categories and fender categories can be formed, each containing training patterns from different models in its cluster. In the same way, different persons will likely appear different even at the same orientation. So, multiple categories will be formed for a single orientation.

It should also be noted that within a single class, multiple views, although taken from the same orientation, may appear in different categories. For example, several views of a single person taken from the front orientation may appear different enough to cause them to be grouped into different clusters. This is caused by many factors such as slight fluctuations in facial expression, lighting, etc.

To label categories, the system of the invention counts the number of training patterns of each class within the pattern cluster of each category. It uses the counts to generate a training class histogram for each category which shows the number of training patterns of each class within the category's cluster. The class in the category with the highest number of training patterns is termed the "peak class" of the category and represents the maximum point in the category's training class histogram. The system uses the training histograms of the categories to assign labels to the categories. It computes a category class contrast which indicates the degree to which the peak class dominates the other classes in the category. If the contrast exceeds a preset category labeling threshold, the category is labeled as the peak class. If it does not exceed the threshold, the category is labeled unknown.

Thus, the network defines a group of categories each of which is labeled in accordance with the number of patterns of a given class of pattern within the pattern cluster of the category. The category definition includes a training class histogram which indicates the probability of occurrence of a training pattern within the peak class of the category.

After training and labeling categories, the system of the invention can begin its testing operation during which frames of data in the form of test patterns from a subject can be associated with the categories such that the subject can be classified within a class. The processing of all test data from a subject to be classified is termed an "observation." Observations can comprise a single-frame test pattern of test data or multiple frames such as a video tape of a subject.

In multiple-frame observations, the test pattern frames are processed one at a time. As each test pattern is received, a correlation or distance between it and each of the category definitions is computed. The peak class of the best match category, along with the corresponding training class histogram, is associated with the test pattern. When all of the frames have been received, the individual training class histograms are combined to form a single observation class histogram.

In the preferred embodiment, the observation class histogram is formed by accumulating the training class histograms associated with each test data frame from the subject. The accumulation of histograms is performed according to an accumulation rule similar to the learning rule used to define the categories during the training operation. After the first data frame is received, the observation class histogram becomes the training histogram for the category associated with the data frame. Training histograms associated with subsequent data frames are added to the observation class histogram according to the accumulation rule. To update the observation histogram to include the histogram for a new test pattern, the pattern counts for each class in the current observation class histogram are subtracted from corresponding class counts of the incoming new histogram. The differences in counts are each multiplied by the accumulation rate γ. This product is then added to the corresponding class pattern counts for the current observation class histogram to obtain the updated observation class histogram.

In another embodiment, the individual training histograms are accumulated to create the observation class histogram by simply adding the individual histograms together. In another embodiment, the contribution of each class to the observation histogram is determined by scaling by 100 the cluster class contrast for the cluster containing each class. The resulting scaled histograms are then simply added together to form the observation class histogram. In another embodiment, only the peak class of each category is used. The contribution of the peak class to the observation histogram is calculated by scaling the class contrast of the category by 100. In still another embodiment, the histograms are not actually accumulated. Rather, a simple count is computed of the number of test data frames associated with each peak class to determine the contribution of that class to the observation histogram. As each test frame is received, it is identified as being within the peak class of the category with which it is associated. During the observation, a running count is kept of the number of frames identified with each class. The observation class histogram is simply a plot of the counts for each peak class made during the observation.

After the observation class histogram is generated, an observation class contrast is computed as a measure of the domination by the peak class of the observation histogram. If the contrast exceeds a preset classification threshold, the subject is classified as being the peak class. If the contrast is below the threshold, the subject is classified as unknown. Also, if the peak class of the observation class histogram is an unknown class, the subject is classified as unknown.

For single-frame observations, the correlation or distance between the test frame and each of the category definitions is computed. The peak classes of a preselected number k of categories having the highest correlations with the test pattern are then assembled into a "k-nearest neighbor" observation class histogram. The contribution of each class to the k-nearest neighbor histogram is determined by scaling the class contrast of the category from which the class was taken. A k-nearest neighbor observation class contrast is computed which indicates the degree to which the peak class in the k-nearest neighbor observation histogram dominates the other classes. If this contrast is above a predetermined classification threshold, the test pattern is classified as the peak class. If it is below the threshold, the test pattern is classified as unknown. As in the multiple-frame case, if the peak class of the observation histogram is an unknown class, the test pattern will be classified as unknown.

In one embodiment, the system classifies subjects by processing patterns of multiple data types from the subject, for example, video and voice data. In this embodiment, multiple pattern recognition systems are used, one system for each data type. Each of the systems operates individually during training and category labeling to set up its own set of categories. During the testing operation, each network receives data patterns of its corresponding data type, associates its patterns with categories and corresponding class histograms, and generates an observation class histogram accordingly. The system then fuses the data-type-specific observation class histograms, i.e., the voice observation class histogram and the video observation class histogram, to generate a cumulative final decision class histogram. The peak class contrast for the cumulative final decision class histogram is computed, and, if it is above a threshold, the subject is classified as the peak class of the cumulative histogram. Thus, the system combines the pattern recognition of both voice and visual data into one cumulative subject classification having a higher degree of confidence than either of the two individual networks.

The pattern recognition system of the invention can also operate in a multiple-channel mode and configuration. In the multiple-channel mode, each training pattern and test pattern includes plural individual feature patterns defining individual features of subjects. The individual features of the patterns are identified by a feature extraction process. The features can be separate portions of a single image, such as a nose feature, a mouth feature and eye features in a face pattern. Alternatively, each feature can be a different view of the same subject, i.e., a front view of an automobile and a side view of an automobile.

In the multiple-channel mode, each category definition vector is also made up of individual feature definitions. As in the single-channel mode, during training, as each training pattern is received, a correlation is computed between the training pattern and each category definition vector. In the multiple-channel mode, the correlation is computed by first computing individual feature correlations between each feature in the training pattern and the corresponding feature definition in the category definition vector. In the preferred embodiment, the lowest of the feature correlations in a category is taken as the overall correlation between the training pattern and the category definition vector. When each of these overall category correlations is so computed, the incoming training pattern including all of its individual feature patterns is added to the cluster whose category definition vector has the closest correlation.

In multiple-channel testing, each incoming test pattern also includes multiple feature definitions. A correlation between a test pattern and each category is computed as in the multiple-channel training mode. That is, each feature pattern in the input testing pattern is compared to each individual feature definition within each category definition vector to obtain multiple feature correlations for each category. The lowest of the feature correlations for a category is taken to be the overall correlation between the test pattern and the category. The category with the closest overall correlation with the test pattern is associated with the test pattern as its best match category.

The multiple-channel configuration can provide an accurate classification where the subjects being classified have similar looking features. For example, in a face recognition system, two different persons can have similar noses and eyes but have very different mouths. In a single-channel system, the similarity between noses and eyes may be sufficient to provide a high correlation and may result in an incorrect classification. However, in a multiple-channel mode, the overall correlation will be based on the lowest feature correlation, in this case, the mouth. Since the mouths of the two persons are different, the multiple-channel system will not group the persons together. Thus, the use of multiple channels reduces the possibility of an incorrect classification.

The present invention provides numerous advantages in pattern recognition and subject classification. The system combines the learning features of adaptive pattern recognition systems such as neural networks with statistical decision making to perform its classifications. The definition of categories during training, the labeling of the categories and the output classifications are all performed in terms of histograms. Thus, the classifications are associated with a probability of correct classification. This provides the user of the system with an indication of the degree of confidence associated with each classification. This type of statistical classification can be more useful to the user than the "hard" classification of other systems.

Also, the statistical methods of the present invention allow for classifying patterns as unknown. If the degree of confidence in a classification is lower than a preset desired level, the subject is labeled as unknown. This feature is important in many applications such as medical diagnostic testing where it is more desirable that a questionable item of test data be classified as unknown than be misidentified.

Also, the system can be incrementally trained. As described above, after a set of training patterns is input to train the system, it can begin classifying subjects. If new training data becomes available, the system can be trained on the new data without completely retraining the system. The new data is input during a new training operation. The new data will result in new categories being formed or in modification of the definitions of the old categories. This can save considerable processing time when there are many training input patterns.

BRIEF DESCRIPTION OF THE DRAWINGS

The foregoing and other objects, features and advantages of the invention will be apparent from the following more particular description of preferred embodiments of the invention, as illustrated in the accompanying drawings in which like reference characters refer to the same parts throughout the different views. The drawings are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the invention.

FIG. 1 is a schematic illustration of a class histogram in accordance with the present invention.

FIGS. 2A-2E are training class histograms in accordance with the present invention, each of which is associated with a single frame of input test data.

FIGS. 3A-3E show the process of accumulating training class histograms to create the observation class histogram in accordance with the present invention.

FIGS. 4A-4E show five nearest neighbor training class histograms for a single frame of input test data.

FIG. 5 shows the k-nearest neighbor class histogram created by combining the training histograms of FIGS. 4A-4E in accordance with the present invention.

FIG. 6 is a functional block diagram of the data fusion configuration of the present invention.

FIG. 7 is a functional block diagram of the system of the present invention in the Training configuration.

FIG. 8A is a functional block diagram of the Distance Metric Module of the present invention.

FIG. 8B is a functional block diagram of the Data Shift Submodule within the Distance Metric Module of the present invention.

FIG. 8C is a functional block diagram of the Distance Metric Submodule within the Distance Metric Module of the present invention.

FIG. 9 is a functional block diagram of the Best Match Module of the present invention.

FIG. 10 is a functional block diagram of the Distance Decision Module of the present invention.

FIG. 11 is a functional block diagram of the Add New Category Module of the present invention.

FIG. 12 is a functional block diagram of the Category Adjust Module of the present invention.

FIG. 13 is a functional block diagram of the system of the present invention in the Category Labeling configuration.

FIG. 14 is a functional block diagram of the Cluster Class Accumulator Module of the present invention.

FIG. 15 is a functional block diagram of the Category Label Assignment Module of the present invention.

FIG. 16 is a functional block diagram of the Class RAM Update Module of the present invention.

FIG. 17 is a functional block diagram of the system of the present invention in the Testing configuration.

FIG. 18 is a functional block diagram of the system of the present invention in the multiple-channel Testing configuration.

FIG. 19 is a functional block diagram of the Multiple-Channel Distance Metric Module of the present invention.

FIG. 20 is a functional block diagram of the Channel Accumulator Module of the present invention.

DETAILED DESCRIPTION OF THE INVENTION

When the system of the present invention is trained, it receives training data patterns from various subjects or classes. In the case of a face recognition system, these patterns may include photographs of individual persons from several different orientations and/or exhibiting several different facial expressions. Photographs may also be shown of subjects with and without eyeglasses, with and without facial hair, etc. Voice data from different persons (classes) can also be received. As another example, in the case of a system used to identify semiconductor wafer defects, visual images of different types of defects as well as images of wafers having no defects can be received for training.

Each training pattern is associated with a known class and takes the form of a feature pattern vector I_(INP). Each category definition I_(k) is expressed in a vector format compatible with the feature vector. As each pattern vector is received, a correlation C_(TRN) between it and each existing category definition is performed. In the case of a face recognition system, the correlation is computed according to

    C_(TRN) = (I_(INP) ·I_(k)) / (|I_(INP)| |I_(k)|)    (1);

where C_(TRN) is the training correlation,

I_(INP) is the input feature vector,

I_(k) is the present category definition vector, and

I_(INP) ·I_(k) is the vector dot product of I_(INP) and I_(k).

The correlation C_(TRN) is then compared to a preset training threshold λ_(TRN). If a category is found for which the correlation C_(TRN) exceeds the threshold λ_(TRN), then the training pattern is added to the cluster of that category, and the definition vector I_(k) of that category is modified to incorporate the effects of the feature vector I_(INP) of the input pattern. If more than one category has a correlation C_(TRN) above the threshold λ_(TRN), I_(k) for the best match category, i.e., the category with the highest correlation, is modified and the training pattern is added to the cluster of that category. I_(k) is modified in accordance with a learning rule given by

    I_(k)^(NEW) = I_(k)^(OLD) + α(I_(INP) - I_(k)^(OLD))    (2);

where

I_(k)^(NEW) is the resulting category definition vector after modification,

I_(k)^(OLD) is the category definition vector before modification,

α is a user defined "learning rate" which governs the rate of cluster update, and

I_(INP) is the input feature vector.

For α=1, I_(k)^(NEW) = I_(INP), i.e., the input vector replaces the category definition vector; and for α=0, I_(k)^(NEW) = I_(k)^(OLD), i.e., no category definition update occurs. A high α value indicates fast learning, and a low α value indicates slow learning. Typically, the learning rate α is set to a value between 0.2 and 0.5 to suppress small variations due to noise and shifts in the input training patterns and to simultaneously reinforce consistent features in the input vectors.

If after computing correlations C_(TRN) for each category, no correlation is found to exceed the training threshold λ_(TRN), then a new category is formed. The definition vector I_(k) of the new category is identical to the feature vector I_(INP) of the input training pattern, and the cluster of the new category contains only the single input pattern.
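
For illustration only (not part of the described hardware), the training behavior of equations (1) and (2) can be sketched in software as follows. This Python sketch assumes feature vectors are NumPy arrays; the function name, threshold value and learning rate value are assumptions chosen for the example.

    import numpy as np

    def train(patterns, lam_trn=0.9, alpha=0.3):
        # patterns: list of (feature vector I_INP, class ID) pairs
        categories = []   # category definition vectors I_k
        clusters = []     # one dict per category: class ID -> bin count
        for I_inp, class_id in patterns:
            I_inp = np.asarray(I_inp, dtype=float)
            # Equation (1): normalized dot-product correlation C_TRN
            corrs = [I_inp @ I_k / (np.linalg.norm(I_inp) * np.linalg.norm(I_k))
                     for I_k in categories]
            if corrs and max(corrs) > lam_trn:
                k = int(np.argmax(corrs))      # best match category
                # Equation (2): move the definition toward the input
                categories[k] = categories[k] + alpha * (I_inp - categories[k])
                clusters[k][class_id] = clusters[k].get(class_id, 0) + 1
            else:
                # No correlation exceeds the threshold: form a new category
                categories.append(I_inp)
                clusters.append({class_id: 1})
        return categories, clusters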

The cluster of each category is divided into memory bins, each of which is identified with a specific class. As each input training pattern comes into a cluster, it is stored in the bin which is associated with its class. For example, if a training pattern from a certain person B is associated with a best match category 5, the input pattern is stored in the person B bin of the cluster of category 5 and the pattern count for the bin is incremented. In each category, the class with the highest bin count is the peak class of that category.

Some category definitions change many times as patterns of training data are received. By the time the end of the training set is reached, some category definition vectors are substantially different from what they were at the beginning of the set. The variations can be so great that some patterns from the beginning of the training set could be included in clusters other than those in which they are stored if they were received again for training. In order to eliminate this condition and stabilize the category definitions, the training set can be repeatedly reprocessed until none of the training patterns change clusters. In a preferred embodiment, the training set will usually be processed no more than five times.

After the set of training patterns has been completely processed as described above to form the clusters, each category is labeled with a class name. In the preferred embodiment, the categories are labeled according to the class bin counts. A category may be labeled as its peak class or it may be labeled an unknown class depending upon the degree to which the peak class dominates the cluster of the category. To make this determination, a cluster class contrast CC_(C) is computed for each category by taking the number of training patterns for the peak class N_(PEAK) and subtracting the average number of patterns for the remaining classes N_(MEAN). The result is normalized by dividing this difference by the total number of patterns N_(TOTAL) in the cluster. That is,

    CC_(C) = (N_(PEAK) - N_(MEAN)) / N_(TOTAL)    (3)

Thus, the class contrast for a cluster formed from a single class will be unity. The class contrast for a cluster in which no single class dominates will be close to zero, since the difference between the peak and the mean will be very small.

After the entire training set has been processed, the class bin counts for each cluster are plotted to form a training class histogram for each category. FIG. 1 schematically depicts a typical category training class histogram. This category was the best match category for a total of 250 training patterns. Of these 250 patterns, 60 were from class A, 40 were from class B and 150 were from class C. Class C is the peak class of the category; and N_(PEAK) = 150, N_(MEAN) = 50 and N_(TOTAL) = 250. Therefore, the class contrast CC_(C) is given by

    CC_(C) = (150 - 50) / 250 = 0.4

After the class contrast CC_(C) is computed for a cluster, it is compared to a preset cluster class contrast labeling threshold λ_(L). If the contrast CC_(C) exceeds the threshold λ_(L), then the category is labeled with the name or identification of the peak class of the cluster. If the contrast CC_(C) does not exceed the threshold λ_(L), the category will be labeled as unknown.

As previously stated, in the class histogram of FIG. 1, the class contrast CC_(C) = 0.4. If the class contrast labeling threshold λ_(L) is set at 0.3, then this category will be labeled as class C. In that case, during subsequent testing, a test pattern for which the category is the best match category will be associated with class C. On the other hand, if the class contrast threshold is set at 0.6 for example, the category will be labeled as unknown. In that case, during testing, if the category is the best match category of an input test pattern, that test pattern will be identified as unknown.
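
A short Python sketch of this labeling rule, using the FIG. 1 bin counts as a check (the function name and threshold values are illustrative only):

    def label_category(bins, lam_l):
        # bins: class ID -> training pattern count for one category cluster
        peak = max(bins, key=bins.get)
        n_peak = bins[peak]
        n_total = sum(bins.values())
        # mean count over the non-peak classes, then equation (3)
        n_mean = (n_total - n_peak) / max(len(bins) - 1, 1)
        contrast = (n_peak - n_mean) / n_total
        return (peak if contrast > lam_l else "unknown"), contrast

    bins = {"A": 60, "B": 40, "C": 150}     # the FIG. 1 histogram
    print(label_category(bins, 0.3))        # ('C', 0.4): labeled class C
    print(label_category(bins, 0.6))        # ('unknown', 0.4)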

It can be seen from the foregoing description that the statistical methods employed in the invention provide an indication of the degree of certainty associated with pattern classifications. By properly choosing the training and labeling thresholds λ_(TRN) and λ_(L), one can assure that the system is trained in a fashion appropriate for the application. For example, setting a very high training threshold λ_(TRN) will cause many categories to be formed during training, resulting in a look-up table type of classification during testing with a high degree of confidence in the classification. A low training threshold λ_(TRN) will result in fewer categories which contain more classes of training patterns. The class contrast labeling threshold λ_(L) is set according to the amount of certainty required in a classification before a test pattern can be associated with a particular class. If a category has a high labeling threshold λ_(L) and that category is labeled with a class name instead of being labeled unknown, then that class exhibits a strong domination of the category. An incoming test pattern which correlates with the category will most likely belong to the dominating class. Thus, a high class contrast labeling threshold λ_(L) indicates a high degree of confidence in the classification.

After training and category labeling are complete, the system is ready to classify input test patterns during the testing operation. Test data can consist of multiple frames of data such as a real-time video observation or it can be a single data frame such as a snapshot of the subject.

In the case of multiple-frame observation testing, a testing correlation C_(TEST) similar to the training correlation C_(TRN) is computed between each individual frame of testing data and each of the category definitions. A best match category having the highest correlation with the test frame pattern is selected. If the correlation C_(TEST) of the best match category is above a preset testing correlation threshold λ_(TEST), then the input frame is associated with the best match category and its corresponding training class histogram generated for the category during training. When the observation is complete, each frame of data is associated with a single training class histogram. These training class histograms are combined to form a single observation class histogram associated with the input subject.
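
In software terms, the per-frame association step might be sketched as follows (Python; the names and the threshold value are assumptions, with `categories` and `histograms` as produced by a training sketch like the one above):

    import numpy as np

    def associate(frame, categories, histograms, lam_test=0.8):
        # Correlate the test frame with every category definition
        corrs = [frame @ cd / (np.linalg.norm(frame) * np.linalg.norm(cd))
                 for cd in categories]
        if not corrs:
            return None
        k = int(np.argmax(corrs))            # best match category
        # Associate the frame with that category's training class
        # histogram only if the correlation exceeds the testing threshold
        return histograms[k] if corrs[k] > lam_test else None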

In a preferred embodiment, the observation class histogram is a weighted accumulation of the training class histograms associated with the data frames. The training histograms are accumulated in the observation histogram as the data frames are processed by updating the observation histogram according to an accumulation rule given by

    H_(NEW) = H_(OLD) + γ(H_(CURRENT) - H_(OLD))    (4);

where

H_(NEW) is the observation histogram after being updated according to the accumulation rule,

H_(OLD) is the observation histogram before being updated,

H_(CURRENT) is the training histogram associated with the current data frame being added to the observation histogram, and

γ is the accumulation rate.

H refers to the bin counts of training patterns in each class of each of the histograms. That is, referring to FIG. 1 for example, H for this histogram would represent the 150 patterns of class C, the 60 patterns of class A and the 40 patterns of class B. So, when an existing observation histogram H_(OLD) is updated according to the accumulation rule, the bin counts for each class in the histogram are subtracted from the bin counts of the same classes in the current training histogram H_(CURRENT). The difference in bin counts for each individual class is multiplied by the accumulation rate γ. The weighted bin count differences for each of the classes are then added to their corresponding class bin counts in the existing observation histogram H_(OLD) to obtain the updated observation histogram H_(NEW). When the last test pattern from the subject is received, the last associated training histogram is accumulated within the observation class histogram to form the final observation class histogram.
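
A minimal sketch of this accumulation, with each histogram held as a mapping from class ID to bin count (Python; treating classes absent from a histogram as zero is an assumption made for the sketch):

    def accumulate(h_obs, h_current, gamma=0.2):
        # Equation (4), applied class by class
        classes = set(h_obs) | set(h_current)
        return {c: h_obs.get(c, 0.0)
                   + gamma * (h_current.get(c, 0.0) - h_obs.get(c, 0.0))
                for c in classes}

    # The first frame initializes the observation histogram directly:
    h_obs = {"A": 100, "B": 60, "C": 40}    # category 7 histogram (FIG. 2A)
    # Each subsequent frame's training histogram is then folded in:
    # h_obs = accumulate(h_obs, h_frame2), and so on.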

FIGS. 2A-2E and 3A-3E illustrate the process of accumulating individual training class histograms from frames of input test data to generate the observation class histogram. FIGS. 2A-2E each represent the training class histogram for a category which is associated with a single frame of input test data. For simplicity, the data frames will be referred to as frames 1-5, with FIG. 2A representing the histogram for frame 1, FIG. 2B representing the histogram for frame 2, and so on. The categories are arbitrarily labeled categories 7, 1, 4, 8 and 2. It will be understood that each frame need not be associated with a unique category as in this illustration. It is likely that in actual testing, multiple data frames will be associated with a single category.

Frame 1 of test data shown in FIG. 2A was associated by the system as described above with category 7. When the system was trained, a total of 200 training patterns were stored in the category 7 cluster. Of these patterns, 100 were from class A, 60 were from class B and 40 were from class C. The cluster class contrast CC_(C) computed from equation 3 above is given by

    CC_(C) = (100 - 50) / 200 = 0.250

For the purpose of this illustration, it is assumed that during training the cluster class contrast labeling threshold λ_(L) was set at 0.200. Therefore, the class contrast for category 7 exceeded the labeling threshold, and consequently, category 7 was labeled as class A, the peak class of the category.

FIG. 2B shows that frame 2 of the test data from the subject is associated with category 1. The cluster class contrast CC_(C) of category 1 is 0.423 which is greater than λ_(L). Therefore, category 1 was labeled as class D. FIG. 2C indicates that frame 3 is associated with category 4 having a class contrast CC_(C) of 0.307 and being labeled as class A. As shown in FIG. 2D, frame 4 of the test data is associated with category 8. Category 8 contains three classes A, B and C and has a cluster class contrast of 0.100, below the labeling threshold λ_(L) of 0.200. Therefore, category 8 is labeled as unknown. As shown in FIG. 2E, frame 5 of the test data was associated with category 2, having a cluster class contrast CC_(C) of 0.318 and therefore being labeled as class E.

As described above, the observation class histogram is created by accumulating each of the individual training histograms shown in FIGS. 2A-2E in accordance with the accumulation rule in equation (4). FIGS. 3A-3E illustrate the stages of accumulation of the observation class histogram as the data frames are received by the system. That is, FIG. 3A shows the observation class histogram after frame 1 is received; FIG. 3B shows the observation class histogram after frame 2; and so on. Finally, FIG. 3E shows the final observation class histogram in which all five individual training histograms from the five data frames are accumulated.

After the first frame of data is received, the observation histogram is identical to the training histogram for the category associated with frame 1. This is shown in FIG. 3A in which the observation histogram is the same as the training histogram for frame 1 shown in FIG. 2A.

When frame 2 is received, contributions for all of the classes in the first two training histograms are computed from the accumulation rule in equation 4. For this illustration, the accumulation rate γ is chosen to be 0.2. The class contributions in the updated observation class histogram are computed, for each class, by

    H_(NEW) = H_(OLD) + 0.2(H_(CURRENT) - H_(OLD))

The observation class histogram generated from the above bin counts after frame 2 is shown in FIG. 3B.

When frame 3 is received, the category 4 training histogram as shown in FIG. 2C is accumulated within the observation class histogram. The resulting histogram is shown in FIG. 3C. The heights of the class indicators are determined by applying the accumulation rule of equation (4) to each class bin count.

Frame 4 was associated with category 8, which is labeled unknown. Nevertheless, the histogram is combined with the observation histogram in the same fashion as histograms for categories with class labels. FIG. 3D shows the observation histogram after the category 8 training histogram is added. The heights of the class indicators are again computed by applying the accumulation rule to each class bin count.

When frame 5 is processed, the final training histogram associated with category 2 is accumulated within the observation class histogram. The completed observation class histogram is shown in FIG. 3E. The heights of the class indicators are likewise computed from the accumulation rule of equation (4).

Other approaches may also be used to form the observation histogram. For example, instead of using the accumulation rule to combine the individual training histograms, they may simply be added together. The bin count for each class in the observation histogram would be the total of the bin counts for that class from all of the training histograms. For the illustration described above, the individual class contributions to the observation histogram would be calculated by summing, for each class, its bin counts over the five training histograms of FIGS. 2A-2E.

Another method of forming the observation class histogram involves using the cluster class contrast CC_(C) of each category associated with the data frames. In this method, only the peak class from each of the 5 categories is used. The contribution of the peak classes to the observation histogram is calculated by scaling the cluster class contrast CC_(C) of the peak class's category by 100 and adding all of the scaled contrasts together. For example, in the illustration above, the individual class contributions will be calculated as follows.

    A = 100 × (0.250 + 0.307) = 55.7
    D = 100 × 0.423 = 42.3
    E = 100 × 0.318 = 31.8
    Unknown = 100 × 0.100 = 10.0

The observation histogram so formed would have only four classes, namely, A, D, E and Unknown. Classes B and C would not be included in the observation class histogram because they were not peak classes of any category associated with any of the test frames. It should also be noted that the contribution of frame 4 is added to the unknown class rather than class A since the class contrast CC_(C) was below the labeling threshold.

A fourth method of producing the observation class histogram also involves using the cluster class contrast CC_(C) for the categories. In this method, though, all of the classes from each category are used in the observation class histogram instead of only the peak class. Rather than simply scaling the peak class by the class contrast CC_(C) as in the previous method, the entire histogram, i.e., all of the bin counts for all classes in the category, is multiplied by the class contrast of the category and accumulated within the observation class histogram. In this method, the contribution of each class to the observation class histogram is the sum, over the data frames, of that class's bin count scaled by the class contrast of the frame's category.

In a fifth method of producing the observation class histogram, the contribution of a class to the observation histogram is simply the number of frames of test data which are associated with a category having that class as its peak class. That is, as a test frame is received, it is associated with a category which has a peak class or unknown label. A running count is made for each class of the number of frames received for that class. When all of the test frames have been received, the height of each class indicator in the observation histogram is simply the number of test data frames received for that class. For example, in the illustration above, A=2, B=0, C=0, D=1, E=1 and Unknown=1.
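
This counting method reduces to a few lines of Python; the sketch below reproduces the counts given above from the labels of the five best match categories (illustrative only):

    # Peak class label of the best match category for frames 1-5
    frame_labels = ["A", "D", "A", "unknown", "E"]

    h_obs = {}
    for label in frame_labels:
        # Each frame adds one count to its category's peak class
        h_obs[label] = h_obs.get(label, 0) + 1

    print(h_obs)   # {'A': 2, 'D': 1, 'unknown': 1, 'E': 1}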

After the observation histogram is created, an observation class contrast CC₀, analogous to the cluster class contrast CC_(C), is computed. The observation class contrast CC₀ is an indication of the extent to which the peak class of the observation histogram dominates the other classes. It is computed by taking the contribution of the peak class, subtracting the mean of the contributions from the other classes and dividing that difference by the total of the class contributions. For the observation histogram of FIG. 3E, CC₀ is computed in this manner from the final class contributions. If this number exceeds a preset observation class contrast classification threshold λ_(CL), then the subject is classified as being class A, the peak class of the observation histogram. If it does not exceed the threshold λ_(CL), then the subject is classified as unknown.

The pattern recognition system of the present invention can also classify subjects when only a single frame of test data is received. During testing, the system calculates the test correlation C_(TEST) between the input pattern and each of the categories. The categories are then sorted according to their correlation to the input pattern. A preset number k of categories having the closest correlations to the test pattern are then used to form a "k-nearest neighbor" class histogram. The peak class from each of the k categories is included in the histogram. The contribution of each class is determined by the cluster class contrast CC_(C) of its category scaled by 100. It should be noted that only those categories which have a correlation C_(TEST) with the input pattern above the testing threshold λ_(TEST) are included in the k-nearest neighbor histogram, even if that means that fewer than k categories will be included. Also, in the case of k=1, the subject is simply identified as the class of the best match category.

FIGS. 4A-4E and FIG. 5 illustrate the process of generating the k-nearest neighbor histogram. In this illustration, k=5. FIGS. 4A-4E show the training histograms for the 5 categories having the closest correlations with the single input test frame. FIG. 5 shows the k-nearest neighbor histogram which results from combining the training histograms of FIGS. 4A-4E. The contribution of each class to the observation histogram is computed by scaling by 100 the cluster class contrast CC_(C) of the category of which the class is the peak class. The individual class contributions for the illustration of FIGS. 4A-4E are each calculated in this manner, as 100 times the class contrast of the corresponding category. The results of these calculations give the heights of the class indicators in the k-nearest neighbor histogram of FIG. 5.

A k-nearest neighbor class contrast based on the peak class in the k-nearest neighbor histogram is then calculated. If this class contrast exceeds a preset threshold, then the subject is classified as the peak class in the output histogram. Otherwise, the subject is reported as unknown. The class contrast for the k-nearest neighbor histogram of FIG. 5 is computed in the same manner as equation (3), using the class contributions of FIG. 5.
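
Putting the single-frame steps together, a k-nearest neighbor decision might be sketched as follows (Python; the parameter names and threshold values are assumptions for the sketch):

    def knn_classify(corrs, labels, contrasts, k=5, lam_test=0.8, lam_cl=0.2):
        # Sort category indices by correlation with the test pattern
        order = sorted(range(len(corrs)), key=lambda j: corrs[j], reverse=True)
        h = {}
        for j in order[:k]:
            if corrs[j] > lam_test:          # testing threshold check
                # Peak class contribution = 100 x the category's contrast
                h[labels[j]] = h.get(labels[j], 0.0) + 100 * contrasts[j]
        if not h:
            return "unknown"
        peak = max(h, key=h.get)
        total = sum(h.values())
        mean_rest = (total - h[peak]) / max(len(h) - 1, 1)
        contrast = (h[peak] - mean_rest) / total   # as in equation (3)
        return peak if contrast > lam_cl and peak != "unknown" else "unknown"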

The invention described to this point has involved a single pattern recognition system which classifies a single type of data from the subject, for example, visual image data of a person's face. In many applications though, it is desirable to have another type of data from the same subject to confirm or to increase the confidence level in the classification. For example, it would be beneficial to be able to simultaneously classify a person by video data and by voice data.

The system of the present invention allows results of classifications from different data types to be fused into a single cumulative classification. This is accomplished by adding another complete pattern recognition system for each data type to be processed. FIG. 6 is a functional block diagram of the system. Each of the systems 5 is trained in accordance with the foregoing description on sets of training patterns of its corresponding data domain type. In the case of classifying persons, one system may be dedicated to classifications based on visual data and would be trained by visual training data patterns. A second system would be trained on voice data training patterns.

During testing, visual data is input to the visual system and voice data is input to the voice-trained system. Each of the systems operates independently to classify a subject based on input test patterns of its own particular data type. However, the individual systems do not produce a final classification decision. Instead, each system outputs its observation class histogram to a processor 9 which combines the observation histograms into a cumulative data-fused output histogram 6. In the preferred embodiment, this is done by simply adding the contributions of each class in the individual observation histograms together. A cumulative class contrast is then computed. If the contrast exceeds a threshold, a final classification decision is made, and the subject is labeled as the peak class in the cumulative histogram. If the contrast is below the threshold, the subject is labeled as unknown.
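
The fusion step itself reduces to a per-class sum followed by the usual contrast test; a minimal sketch (Python; the names and the threshold value are hypothetical):

    def fuse(observation_histograms, lam_fuse=0.3):
        # Sum the class contributions of each data type's histogram,
        # e.g., [video_histogram, voice_histogram]
        cumulative = {}
        for h in observation_histograms:
            for c, n in h.items():
                cumulative[c] = cumulative.get(c, 0.0) + n
        peak = max(cumulative, key=cumulative.get)
        total = sum(cumulative.values())
        mean_rest = (total - cumulative[peak]) / max(len(cumulative) - 1, 1)
        contrast = (cumulative[peak] - mean_rest) / total
        return peak if contrast > lam_fuse else "unknown"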

This data fusion process of the invention does not require that all data type pattern recognition systems be operating simultaneously. Rather, because of the statistical method behind the process, individual systems can be turned on and off while the testing process continues. The system is merely summing output histograms from the individual systems to strengthen the confidence in the final decision. If one of these histograms is eliminated because, for example, the voice data has stopped, the system output is not inhibited. Instead, only the contribution from the voice system is eliminated. The system can continue to operate to classify input patterns without the contribution from the system which has been removed.

FIG. 7 is a functional block diagram of the pattern recognition system of the present invention shown in the training configuration. During training, inputs to the system include the training threshold λ_(TRN), the training pattern data input, a training pattern data type indication and the learning rate α. The category definition vectors are stored in a category definition RAM 10. The total number of categories at any time is represented by N, and the category definitions are labeled CD₁-CD_(N).

Each category definition is applied to a single Distance Metric Module (DMM) 12. Each of the DMMs 12 also receives the training data input pattern and the data type signal which identifies the data as being either one-dimensional or two-dimensional. Each DMM 12 shifts the input training pattern to several different positions and computes a correlation or distance d_(i) between its corresponding category definition vector and each shifted version of the input data. All of the calculated distances d_(i) are input to a Best Match Module (BMM) 14. The BMM 14 outputs the number k of the category which most closely matches the input training pattern and the correlation or distance d_(k) between the best match category definition vector and the training input pattern vector.

A Distance Decision Module (DDM) 16 receives the user-defined training threshold λ_(TRN) and decides if the distance d_(k) is below the threshold. If d_(k) is not below the threshold λ_(TRN), the training input pattern and the number k of the best match category are forwarded to a Category Adjust Module (CAM) 18 which uses the user-defined learning rate α to adjust the definition of the category k in accordance with the input pattern. The CAM 18 then stores the new category definition back in the category definition RAM 10. If the distance d_(k) is below the threshold λ_(TRN), then the training pattern data is forwarded to an Add New Category Module (ANCM) 20. The DDM 16 transmits an Enable signal to the ANCM 20 to allow the data input pattern vector to be written to the CD_(N+1) location of the category definition RAM 10, thus defining a new category.

FIG. 8A is a functional block diagram of a Distance Metric Module (DMM) 12. The DMM 12 receives as inputs the training data input, the data type signal and the definition vector from a single category CD_(j). The training data input and data type signals are input to a Data Shift Submodule (DSSM) 22, shown functionally in FIG. 8B. The data type signal is applied to AND gate 24. It assumes a logic 0 value for two-dimensional data and a logic 1 value for one-dimensional data. If two-dimensional data is being used, the training data input is enabled through latch 30 to the two-dimensional wrap-around shifter 32. If one-dimensional data is used, latch 26 is enabled to apply the training data input to the one-dimensional wrap-around shifter 28. Each of the shifters 28, 32 generates n shifted versions s_(i) of the training data input pattern labeled s₁-s_(n). Each of these shifted signals s_(i) is input to a Distance Metric Submodule (DMSM) 34, shown functionally in FIG. 8C. Each DMSM 34 receives a single shifted training input data pattern and the category definition vector CD_(j) and determines the correlation or distance d_(ji) between the two vectors by computing the equation

    d_(ji) = (s_(i) ·CD_(j)) / (|s_(i)| |CD_(j)|)

The magnitude of the shifted data vector s_(i) is computed by vector matrix multiply (VMM) module 38 and square root module 39. The magnitude of the category definition vector CD_(j) is computed by VMM 40 and square root module 41. The two magnitudes are then multiplied together by multiplier 42. VMM 36 computes the vector dot product s_(i) ·CD_(j) between vector s_(i) and category definition vector CD_(j) which is then divided at divider 43 by the product obtained at multiplier 42 to produce the output distance d_(ji). Thus, the output from a single DMM 12 includes the correlations or distances d_(ji) between the n shifted input data vectors and the definition vector for a single category j.

The system includes one DMM 12 for each category CD_(j) numbered from 1 to N, each of which outputs n distances d_(ji). Thus, a total of M=n×N distances are input to the Best Match Module (BMM) 14.
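
Functionally, the DMMs 12 and BMM 14 together perform a shift-and-correlate search. A software analogue for the one-dimensional case might read as follows (Python sketch only; note that a circular shift preserves vector magnitude, so |s_(i)| = |I_(INP)|):

    import numpy as np

    def dmm_distances(inp, cd):
        # Distances d_ji between every circular shift s_i of the input
        # pattern and one category definition vector CD_j
        return [np.dot(np.roll(inp, i), cd)
                / (np.linalg.norm(inp) * np.linalg.norm(cd))
                for i in range(len(inp))]

    def best_match(inp, category_defs):
        # BMM analogue: the max picker over all M = n x N distances
        best_k, best_d = -1, -np.inf
        for j, cd in enumerate(category_defs):
            d = max(dmm_distances(inp, cd))
            if d > best_d:
                best_k, best_d = j, d
        return best_k, best_d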

FIG. 9 is a functional block diagram of the BMM 14. All of the distances d₁₁ -d_(Nn) are input to a demultiplexer 52 and are selected one at a time by counter 53 to appear at the output of the demultiplexer 52 as d_(h). The index h of the counter 53 is incremented from zero up through M-1 to enable the distances d_(ji) one at a time to the output of the demultiplexer 52.

RAM 50 and comparator 54 together serve as a max picker circuit. The RAM 50 stores the present value of k, the number of the category having the highest correlation with the input data pattern. The RAM 50 also stores the actual distance d_(k) between that category and the input pattern. The present category number j and the current distance d_(h) being counted are applied to the RAM inputs. Distance d_(h) is compared at comparator 54 to the present d_(k) stored at RAM 50. If d_(h) is greater than d_(k), the comparator 54 enables d_(h) to be written to the RAM 50 to become the new d_(k), and j is written to the RAM 50 as the new category k having the closest correlation to the input. When all of the distances d_(ji) have been processed, the outputs from the BMM 14 give the number of the best match category k and the distance or correlation d_(k) between that category and the input training data. These outputs are forwarded to the Distance Decision Module (DDM) 16.

The DDM 16 is functionally diagrammed in FIG. 10. The DDM 16 compares at comparator 56 the training threshold λ_(TRN) to the distance d_(k). If d_(k) is below the threshold, an Enable signal to add a new category is transmitted to the Add New Category Module (ANCM) 20. If the distance d_(k) is not less than the threshold λ_(TRN), then the category number k is enabled through latch 58 to the Category Adjust Module (CAM) 18.

FIG. 11 is a functional block diagram of the ANCM 20. The ANCM 20 comprises a counter 60 which keeps track of the total number N of categories. If the ANCM 20 is enabled by the DDM 16, the counter is incremented and the new value of N is output to the category definition RAM 10. This new value of N is used by the RAM 10 to point to the next available RAM location for storage of a new category definition vector. When the ANCM 20 is enabled, it sends a Write Enable signal to the RAM 10. It also enables a latch 62 to send the input data vector of the training data pattern to the RAM 10. Thus, the pattern is written to the RAM 10 in the next available location to define a new category.

FIG. 12 is a functional block diagram of the Category Adjust Module (CAM) 18. The CAM 18 receives as inputs the number k of the category whose definition is to be modified, the training input data, and the learning rate α. The category definition to be modified is selected by k and is applied to the learning circuitry 72. The learning circuitry 72 computes the learning equation

    I_(k)^(NEW) = I_(k)^(OLD)(1-α) + αI_(INP)    (6)

derived from equation (2), or, equivalently,

    CD_(k)^(NEW) = CD_(k)^(OLD)(1-α) + αI_(INP)    (7).

The training input data is multiplied by α at multiplier 74. Adder 75 produces (1-α) which is multiplied by category definition CD_(k) at multiplier 76. These two products are added together at adder 78, and the result is applied to an input of multiplexer 80. The result is passed to the appropriate output of multiplexer 80 by the category number k applied to the select line of the multiplexer 80. The modified category definition CD_(k) is then replaced in the appropriate location in category definition RAM 10.

After the training operation is complete, the system assigns labels to the categories according to the classes of training patterns within their corresponding clusters. FIG. 13 is a functional block diagram which shows the category labeling configuration of the system. The DMMs 12 and BMM 14 operate as they do in the training configuration to generate the number k of the best match category and the correlation or distance d_(k) between that category and the training input data pattern. These two outputs are applied to a Cluster Class Accumulator Module (CCAM) 80. As each training pattern is input during the category labeling phase, the CCAM 80 keeps a count of the occurrences of each class within each individual category cluster.

FIG. 14 is a functional block diagram of the CCAM 80. Referring to both FIG. 13 and FIG. 14, an input class identification is associated with each training input data pattern. The class ID is applied to a Class Index Module 82 which generates a class index C which is applied to an input of a demultiplexer circuit 84 within the CCAM 80. A second input to the demux 84 is an unknown class indicator. Depending upon the condition of the select line to the demux 84, either a known class index or the unknown indicator will be forwarded to a Generate RAM Address Module (GRAM) 86. Distance d_(k) is compared to the training threshold λ_(TRN) at comparator 85. If d_(k) is greater than the training threshold λ_(TRN), then the known class index is forwarded to the GRAM 86. If d_(k) is below the threshold λ_(TRN), then an unknown indicator is received by the GRAM 86.

The GRAM 86 also receives the category number k. The GRAM 86 uses the class index C and the category number k to generate the address of an accumulation register 88 which corresponds to the identified class and category. The count within that register is incremented, and the next training pattern is processed. Thus, each accumulation register 88 keeps track of the number of training patterns received for each class within each category cluster. After the entire training data set has been thus processed, the accumulation registers 88 hold the total counts of training patterns for every class and every cluster.

Categories are assigned class names by the Category Label Assignment Module (CLAM) 90, which is functionally diagrammed in FIG. 15. Counters 92 and 94 increment through the number of categories and classes respectively. Their outputs are forwarded to a Generate RAM Address Module (GRAM) 96 which passes addresses to the accumulation registers 88 to access the pattern counts stored there. Within each category cluster, the count of input patterns for each class is read from the registers 88 one at a time. The count is passed to a total accumulator 98 which keeps track of the total number of patterns within the category cluster. The accumulator 98 is reset between clusters so that only the patterns received within a cluster are counted.

The count for the class being processed along with the class index C are applied to a peak class RAM 100. The present peak count is applied to an input of comparator circuit 102 along with the count from the registers 88. If the count for the class being processed is greater than the peak count, the comparator 102 generates a Write Enable signal which enables the count from the registers 88 to replace the peak count in the peak class RAM 100. The present class index also replaces the old peak class index C.

After all of the classes within a category have been read and their counts have been accumulated, the cluster class contrast CC_(C) for the category is computed (see equation 3). The mean of the non-peak class pattern counts is computed by first subtracting the peak count from the total count at adder 108 and dividing the result by the number L-1 of non-peak classes in the category at divider 110. This result is subtracted from the number of patterns within the peak class at adding circuit 112. Next, this result is divided by the total number of patterns in the cluster at divider 114. The resulting cluster class contrast CC_(C) is compared at comparator 116 with the class contrast labeling threshold λ_(L).

The output of the comparator 116 controls the select function of the demultiplexer circuit 118. Depending on the condition of the select line, either the class index C stored in the peak RAM 100 or the unknown indication is passed through the demux 118 to be assigned to the category k at the category label RAM 120. Thus, the RAM 120 identifies the category with either its peak class or an unknown indication. If the cluster class contrast CC_(C) is greater than the labeling threshold λ_(L), then the peak class index C is passed through the demux 118 to the RAM 120 and the category is thus labeled in accordance with its peak class. If the class contrast CC_(C) does not exceed the threshold λ_(L), then the category is labeled as unknown.

The class ID of the input pattern is also forwarded to a Class RAM Update Module (CRUM) 130. The class RAM 132 keeps track of all of the classes represented by the training input patterns. Whenever a new class of input pattern is received by the system, the class RAM 132 must be updated to include the new class.

FIG. 16 is a functional block diagram of the CRUM 130 showing its interaction with the class RAM 132. Each of the existing class IDs in the RAM 132 is applied to the inputs of a demultiplexer circuit 134. A counter 136 increments from 1 to the number P of classes presently stored in the class RAM 132. The output of the counter 136 controls the select function of the demultiplexer 134 such that the class IDs appear one at a time at the output of the demux 134 for each count of the counter 136. The output of the demux 134 and the present input class ID are applied to comparator 138. The input class ID is also applied to the next available location (P+1) in the class RAM 132.

As the stored class IDs are output from the demux 134 one at a time, they are compared with the input class ID at comparator 138. If any of the stored class IDs matches the input class ID, the comparator 138 generates a match signal which is stored in latch 140. The output of the latch is applied to AND gate 142. As the counter 136 counts through the number of classes, the index J of the counter is compared to the total number of classes P at comparator 144. When the index J of the counter reaches the total number of classes P, the output signal from the comparator 144 becomes active. The AND gate 142 generates a high Write Enable signal if the counter reaches P and there has been no match between the input class ID and any of the class IDs stored in the class RAM 132. Since the new class ID is applied to the next available location in the class RAM 132, when the Write Enable signal becomes active, the class ID is stored in the RAM 132 as a new class ID.
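
In software terms, the CRUM 130 performs a membership test followed by a conditional append. The following minimal sketch (names illustrative) captures that behavior:

    def update_class_ram(class_ram, input_class_id):
        # Step through the P stored class IDs (counter 136 / demux 134)
        # and compare each with the input class ID (comparator 138).
        for stored_id in class_ram:
            return_match = stored_id == input_class_id
            if return_match:
                return False          # match latched; no Write Enable
        # No match after P comparisons: Write Enable stores the new ID
        # at the next available location (P+1).
        class_ram.append(input_class_id)
        return True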

After the training and labeling operations described above are complete, the system is ready to classify input patterns during the testing operation. FIG. 17 is a functional block diagram of the system in the testing configuration. During testing, test input data is received by the system at the DMMs 12. The DMMs 12 and the BMM 14 operate as described above for the training operation to produce the category number k and distance d_(k) of the best match category. The DDM 16 compares the distance d_(k) to the testing threshold λ_(TEST). If d_(k) exceeds λ_(TEST), category number k is applied to the select line of demultiplexer circuit 150. Category labels from the category label RAM 120 are applied to the inputs of the demux 150. The category number k selects the proper category label from the RAM 120. This label, along with the identification of the best match category, is output as the category decision. This forms the association between the test pattern frame and a category.
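
The per-frame decision path can be sketched as follows. Per the document's convention, a larger d is treated as a closer correlation; the fall-through to an unknown decision when the threshold is not met is an assumption, as this paragraph does not state the below-threshold behavior:

    def category_decision(distances, labels, lambda_test, unknown="UNKNOWN"):
        # distances: d_1..d_n, one per category; labels: category label
        # RAM 120, mapping category number k to its assigned class label.
        k = max(range(len(distances)), key=lambda i: distances[i])  # BMM 14
        if distances[k] > lambda_test:    # DDM 16 threshold test
            return k, labels[k]           # label selected via demux 150
        return k, unknown                 # assumed below-threshold behavior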

As previously described, in the case of multiple-frame processing, multiple category associations will be formed which include the training pattern statistics for the best match category. These statistics, which can take the form of the training histogram previously described, are stored in the accumulation registers 88. To create the observation histogram for multiple frames of testing data, the appropriate register contents are combined in one of the manners previously described.
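
One such combination is a straight sum of bin counts across frames; the sketch below assumes that rule for concreteness (the specification describes the alternatives earlier and this is not the only manner contemplated):

    def observation_histogram(best_match_categories, training_histograms):
        # best_match_categories: category number k for each test frame.
        # training_histograms: per-category class histograms (registers 88).
        combined = {}
        for k in best_match_categories:
            for class_id, count in training_histograms[k].items():
                combined[class_id] = combined.get(class_id, 0) + count
        # the subject is identified with the peak class of the
        # combined observation histogram
        peak_class = max(combined, key=combined.get)
        return peak_class, combined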

The pattern recognition system of the invention is also capable of operating in a multiple-channel mode or configuration. FIG. 18 is a functional block diagram of the system in the multiple-channel testing configuration. In the multiple-channel configuration, each input data pattern is divided into N features labeled Feature 1, Feature 2, . . . , Feature N. Each of the feature patterns is applied to an input of a Multiple-Channel Distance Metric Module (MCDMM) 200. Each MCDMM 200 computes a correlation or distance d between the input data pattern and a category definition vector CD from the category definition RAM 210. The correlation is computed by each MCDMM 200 on a feature-by-feature or channel-by-channel basis. That is, the correlation or distance d₁ for category 1 is computed by the MCDMM 200 by computing the individual feature correlations between category definition vector CD₁ channels 1 through N and features 1 through N, respectively, of the input data pattern. Distance d₁ is the maximum of the N individual feature distances computed, or, equivalently, d₁ represents the worst or lowest feature correlation calculated for category 1.

The distances are computed for n categories to generate distances d₁ through d_(n). As in the single-channel configuration, these distances are compared in the best match module 14 to select a best match category k based on the closest correlation. The identification of the best match category k and the corresponding distance d_(k) are forwarded to the distance decision module (DDM) 16 which compares the distance d_(k) to the testing threshold λ_(TEST). If d_(k) exceeds λ_(TEST), category number k is applied to the select line of the demultiplexer circuit or label input module 150. The appropriate category label from category label RAM 120 is output as the class decision.

FIG. 19 is a functional block diagram showing the details of the MCDMM 200. The individual distances or correlations between data input features 1 through N and category definition vector CD_(j) channels 1 through N are computed by N distance metric modules (DMM) 12 as described above in connection with FIG. 8A. Each DMM 12 compares a single feature to a single channel in a category definition vector and outputs a corresponding correlation or distance d.

Each distance d from the DMMs 12 is applied to an input of a demultiplexer circuit 226. The output of a counter 224 provides the select input to the demultiplexer 226. The counter output starts at 1 and increments through the number of channels or features N. At each step of the count, a computed distance d is applied to both a maximum distance RAM 222 and a comparator 228. The maximum distance RAM 222 keeps track of the maximum distance d_(l) of the distances output from the demultiplexer 226. The output of the demultiplexer is applied to the input of the RAM 222. The presently stored maximum distance d_(l) is applied to a second input of the comparator 228. If the present distance d is greater than the presently stored maximum distance d_(l), then the Write Enable line to the RAM 222 becomes active to store d as the new maximum distance value d_(l). At the same time, the Write Enable line of feature ID RAM 220 also becomes active to store the index l of the counter 224 to identify the feature which resulted in the maximum distance. When the counter 224 reaches the total number of channels N, the maximum distance RAM 222 stores the maximum feature distance d_(l), and the feature ID RAM 220 stores the feature index l. These are both output from the MCDMM 200.
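
Functionally, the MCDMM 200 is a running-maximum scan over the N feature distances that also records which channel produced the maximum. A minimal software sketch of that scan (the hardware itself is as described above):

    def mcdmm(feature_distances):
        # feature_distances: the N per-channel distances d from the DMMs 12.
        # Track the running maximum (RAM 222) and its channel index (RAM 220).
        l, d_max = 1, feature_distances[0]
        for i, d in enumerate(feature_distances[1:], start=2):
            if d > d_max:             # comparator 228
                d_max, l = d, i       # Write Enable to RAMs 222 and 220
        # worst feature correlation for this category, plus the
        # 1-based index of the feature that produced it
        return d_max, l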

In the multiple-channel mode, the system can process any number of channels. However, each additional channel significantly increases the processing and storage requirements. It is therefore important to select only those features that impact the overall system performance. The system provides a means of tracking the contribution of each feature with a channel histogram. When comparing the multiple channels for a given input with a particular category, the histogram bin that corresponds to the channel with the minimum correlation (maximum distance) is incremented. After processing all of the inputs, the bin in the histogram with the most counts indicates the channel and feature that provide the most separation between classes. Bins with few or no counts indicate which features can be eliminated without significantly impacting performance. Thus, the channel histogram is an important system tool since it indicates the usefulness of each feature extraction technique under consideration.

The channel histogram can be a multi-dimensional plot. That is, the histogram can be a three-dimensional surface above a plane whose orthogonal axes are the category IDs k and channel IDs l. When a bin at a specific channel in a specific category is incremented, the height of the surface above the intersection of the channel and category increases. After all the patterns are received, the peaks in the surface indicate particular features over all of the categories which are most likely to distinguish patterns from each other. The channel histogram can also be one-dimensional, i.e., it can be used to identify the most distinctive feature for each individual category.

FIG. 20 is a block diagram of the Channel Accumulation Module (CAM) 250 which generates the channel histogram. The CAM 250 receives from the MCDMM 200 the identification l of the feature for each category which resulted in the selected maximum distance. The identification of the corresponding cluster k is also received as an input. The two inputs are used to select a RAM address via a Generate RAM Address Module 254. For each cluster k and maximum distance feature l, a corresponding accumulation register in a Channel RAM 252 is incremented when addressed by the Generate RAM Address Module 254. The values in the accumulation registers are the bin counts of the channel histogram.
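
A software sketch of the CAM 250 and of reading out the histogram for feature selection follows; the helper names are illustrative, and the readout function simply applies the most-counts criterion described two paragraphs above:

    from collections import defaultdict

    # Channel RAM 252: bin (k, l) counts how often channel l produced
    # the maximum distance for cluster k.
    channel_histogram = defaultdict(int)

    def accumulate_channel(k, l):
        # address selected from cluster k and feature l (module 254)
        channel_histogram[(k, l)] += 1

    def most_separating_feature():
        # the fullest bin marks the channel/feature giving the most
        # class separation; near-empty bins mark candidates for removal
        return max(channel_histogram, key=channel_histogram.get)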

Although the multiple-channel training configuration is not shown in the drawings, it will be readily understood that modifications to the single-channel mode to facilitate multiple-channel training are similar to those made in the testing configuration. That is, referring to FIG. 7, in the multiple-channel training mode, each Distance Metric Module 12 is replaced with a Multiple-Channel Distance Metric Module 200. The training data input is separated into multiple feature inputs which are applied to the MCDMMs 200. Also, each category definition CD in the category definition RAM includes multiple-channel definitions which are applied to the MCDMMs 200. As in the testing configuration of FIG. 18, each feature of the input is applied to a channel from a category definition. The correlation for each category is the minimum feature correlation computed by the MCDMM 200. The best match module 14 selects the closest correlation to determine the category to which an incoming training pattern should be added.
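
Combining this with the single-channel learning rule recited earlier (modifying the category definition by the learning rate times the difference between the incoming pattern and the definition), a multiple-channel training step might look like the sketch below. The dot product stands in for whatever correlation metric the DMMs implement, and the per-channel update is an assumption extrapolated from the single-channel rule:

    import numpy as np

    def multichannel_train_step(features, categories, lambda_trn, beta):
        # features: (N, D) array, one row per channel of the input.
        # categories: list of (N, D) category definition arrays CD.
        # beta: learning rate from the single-channel learning rule.
        if categories:
            # per-category correlation = minimum (worst) feature
            # correlation; dot product is only an illustrative metric
            dists = [min(float(np.dot(f, c)) for f, c in zip(features, cd))
                     for cd in categories]
            k = int(np.argmax(dists))     # best match module 14
            if dists[k] > lambda_trn:
                # update every channel of the best-match definition
                categories[k] += beta * (features - categories[k])
                return k
        categories.append(features.copy())  # start a new category cluster
        return len(categories) - 1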

While this invention has been particularly shown and described with reference to preferred embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

We claim:
1. A pattern recognition system comprising:
a memory that stores a set of categories of training input patterns from a plurality of subject classes, each training input pattern representing multiple features of a subject, each category having a category definition according to training input patterns within the category, each category definition comprising a plurality of input pattern feature definitions, each category being associated with a training histogram of the training input patterns within the category and the training histogram including counts of training input patterns for each subject class within the category; and
a classifier that receives during a testing operation at least one test input pattern from the subject, that accesses the set of categories, that computes a correlation between a category definition and the at least one test input pattern, that forms a category association between the at least one test input pattern and a category based on the correlation and that forms an observation histogram to classify the subject, the observation histogram being formed from each training histogram of each category of each category association, the observation histogram containing counts of training input patterns in the category, classification of the subject being determined by a peak class of the observation histogram.
2. The pattern recognition system of claim 1 wherein each category is assigned a label which is determined by the peak class of the category.
3. The pattern recognition system of claim 2 wherein the set of categories includes a category labeled as unknown such that the subject can be classified as unknown.
4. The pattern recognition system of claim 1 wherein the correlation between the category definition and the at least one test input pattern comprises plural feature correlations between features of the at least one test input pattern and feature definitions in the category definition.
5. The pattern recognition system of claim 4 wherein the correlation between the category definition and the at least one test input pattern is based on one of the feature correlations.
6. The pattern recognition system of claim 1 further comprising a training subsystem that receives the set of training input patterns during a training operation and forms the set of categories.
7. The pattern recognition system of claim 6 wherein, during the training operation, an existing category definition vector is modified by multiplying a learning rate by a difference between an incoming training input pattern and an existing category definition vector and adding that product to the existing category definition vector.
8. The pattern recognition system of claim 7 wherein, for each category, the training subsystem computes a category class contrast indicative of the degree to which the peak class of the category dominates the other classes of the category.
9. The pattern recognition system of claim 8 wherein, if the category class contrast exceeds a threshold, the peak class is assigned to the category as a label.
10. The pattern recognition system of claim 8 wherein, if the category class contrast is below a threshold, the category is labeled as unknown.
11. The pattern recognition system of claim 1 further comprising a feature extraction subsystem for separating each input pattern into a plurality of feature representations.
12. The pattern recognition system of claim 6 further comprising:
a second training subsystem that receives a second plurality of training input patterns from the subject classes during the training operation and that forms a second set of categories, each category of the second set having associated therewith a category definition and a training histogram;
a second classifier that receives a second test input pattern from the subject, that computes a correlation between a category definition of the second set of categories and the second test input pattern and that forms a second observation histogram; and
a processor for combining the first and second observation histograms into a cumulative histogram to produce a cumulative classification of the subject.
13. A method of pattern recognition comprising:
forming a set of categories of a plurality of training input patterns from a plurality of subject classes, each training input pattern representing multiple features of a subject;
generating a category definition for each category according to the training input patterns within the category, each category definition comprising a plurality of category feature definitions;
for each category, generating a training histogram of the training input patterns received within the category, the training histogram including counts of training input patterns received for each class within the category;
receiving at least one test input pattern from a subject during a testing operation;
computing a correlation between a category definition and the at least one test input pattern;
forming a category association between the at least one test input pattern and a category based on the correlation; and
forming an observation histogram to classify the subject, the observation histogram being formed from each training histogram of each category of each category association and containing counts of training input patterns in the category, classification of the subject being determined by a peak class of the observation histogram.
14. The method of claim 13 further comprising labeling each category with a label determined by a peak class of the category.
15. A method as recited in claim 14 wherein the set of categories includes a category labeled as unknown such that the subject can be classified as unknown.
16. A method as recited in claim 13 wherein the correlation between the category definition and the at least one test input pattern comprises plural feature correlations between features of the at least one test input pattern and feature definitions in the category definition.
17. A method as recited in claim 16 wherein the correlation between the category definition and the at least one test input pattern is based on one of the feature correlations.
18. A method as recited in claim 13 wherein the step of generating a category definition comprises modifying an existing category definition vector by multiplying a learning rate by a difference between an incoming training input pattern and an existing category definition vector and adding that product to the existing category definition vector.
19. A method as recited in claim 13 further comprising the step of computing a category class contrast for each category indicative of the degree to which the peak class of the category dominates the other classes of the category.
20. A method as recited in claim 19 wherein, if the category class contrast exceeds a threshold, the peak class is assigned to the category as a label for the category.
21. A method as recited in claim 19 wherein, if the category class contrast is below a threshold, a category label of unknown is assigned to the category.
22. A method as recited in claim 13 further comprising separating each input pattern into a plurality of feature representations.
23. A method as claimed in claim 13 further comprising receiving a plurality of training input patterns from a plurality of subject classes during a training operation to form the set of categories.
24. A method as recited in claim 23 further comprising:
receiving a second plurality of training input patterns from the subject classes during the training operation;
forming a second set of categories, each category of the second set having associated therewith a category definition and a training histogram;
receiving a second test input pattern from the subject;
computing a correlation between a category definition and the second test input pattern;
forming a second observation histogram; and
combining the first and second observation histograms into a cumulative histogram to produce a cumulative classification of the subject.